DOI: 10.1201/9781003355205-8

Chapter 8

Shotgun Metagenomic Data Analysis

8.1  INTRODUCTION

In the previous chapter, we discussed amplicon-based metagenomic data analysis, which is based on profiling a single targeted gene, usually the 16S rRNA gene, in environmental or clinical samples. Many researchers argue that this approach is not truly metagenomic because it focuses on a single gene rather than the entire genomes of the microbes in the samples. In this chapter, we will discuss the shotgun metagenomic sequencing approach, which involves sequencing the entire genomes of the microbes in the samples. It therefore provides more insight into the microbial communities, their genetic profiles, their impact on hosts, and their association with the host phenotype. Shotgun sequencing of metagenomes is relatively new. It emerged as a consequence of progress in high-throughput sequencing technologies, which was followed by progress in the computational resources and tools capable of handling the size and complexity of metagenomic sequencing data. Shotgun whole-genome metagenomic sequencing and data analysis are now used to quantify microbial communities and their diversity, to assemble novel microbial genomes, to identify new microbial taxa and genes, to determine the metabolic pathways orchestrated by the microbial community, and more.

The raw metagenomic data produced by a high-throughput sequencer originates from either environmental or clinical samples that contain multiple microbial organisms, including bacteria, fungi, and viruses. Data from samples recovered from a host may be contaminated with host genomic sequences. Multiple samples can also be sequenced in a single run (multiplexing). In multiplex sequencing, unique barcode sequences that identify each sample are ligated to the DNA fragments during the DNA library preparation step. Some library preparation kits allow multiplexing of hundreds of samples.
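To make the idea of barcode-based multiplexing concrete, the following is a minimal Python sketch of how reads from a pooled run could be assigned back to their samples. It is an illustration under simplifying assumptions, not the procedure used by any particular sequencer or kit: it assumes a fixed-length barcode sits at the start of each read, and the barcode table, sample names, and the function names `hamming` and `demultiplex` are all hypothetical. In practice, demultiplexing is normally done by the sequencing vendor's software using the sample sheet.

```python
from collections import defaultdict

# Hypothetical barcode-to-sample table (illustration only); real barcodes
# come from the library preparation kit and the run's sample sheet.
BARCODES = {
    "ACGT": "sample_A",
    "TGCA": "sample_B",
}


def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def demultiplex(reads, barcodes=BARCODES, max_mismatches=0):
    """Group reads by the barcode found at the start of each read.

    Assumes a fixed-length inline barcode at the 5' end (a simplification).
    Reads that match no barcode, or more than one, are kept unmodified
    under the key 'undetermined'.
    """
    length = len(next(iter(barcodes)))
    bins = defaultdict(list)
    for read in reads:
        tag, insert = read[:length], read[length:]
        hits = []
        if len(tag) == length:  # skip reads too short to carry a barcode
            hits = [sample for bc, sample in barcodes.items()
                    if hamming(tag, bc) <= max_mismatches]
        if len(hits) == 1:
            bins[hits[0]].append(insert)  # barcode trimmed from the read
        else:
            bins["undetermined"].append(read)
    return dict(bins)


if __name__ == "__main__":
    pooled = ["ACGTTTTT", "TGCAGGGG", "NNNNCCCC", "ACGAAAAA"]
    for sample, sample_reads in demultiplex(pooled, max_mismatches=1).items():
        print(sample, sample_reads)
```

Allowing one mismatch (`max_mismatches=1`) tolerates a single sequencing error in the barcode, at the cost of requiring barcodes that differ from each other at more than two positions so that assignments stay unambiguous.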

Illumina has multiple kits for library preparation, including Illumina DNA Prep, (M) Tagmentation, which uses bead-linked transposomes in the tagmentation process to randomly